1. INTRODUCTION

A  Hidden  Markov  Model  (HMM)  can  be  considered  a  state  machine  in  which 
state  transitions  and  state  outputs,  or  observations,  are  probabilistic.  HMM’s  are  used  to 
learn and classify sequences of observables. HMM technology has been used
successfully in a diverse set of applications, such as speech recognition [Da, Pi], gene
prediction [Ra], and cryptanalysis [Si].

Because  of  the  probabilistic  nature  of  the  underlying  process  being  observed  by 
HMM’s,  they  are  not  used  often  to  recognize  long-periodic  sequences.  Rather,  they  are 
mostly  used  as  discriminators,  to  determine  whether  one  HMM  is  better  than  another.  For 
example,  an  HMM-based  speech  recognition  system  may  have  each  HMM  represent  a 
word,  with  run  time  voice  recognition  choosing  the  HMM  that  best  fits  the  incoming 
sequence  of  speech  features. 

This is in contrast with Deterministic Finite Automata (DFA) [HWU], Finite
State Machines (FSM’s) [KJ], or Harel-Statecharts [Ha, D1, D2], which are often used to
identify and classify individual sequences. Stated differently, because HMM’s identify
individual sequences of external observables with a relatively low probability, such an
identification is usually not perceived as convincing evidence of the occurrence of a
particular sequence.

Run-time Verification (RV) of formal specification assertions, also known
as  Run-time  Execution  Monitoring  (REM),  is  a  class  of  methods  for  monitoring  the 
sequencing  and  temporal  behavior  of  an  underlying  application  and  comparing  it  to  the 
correct  behavior  as  specified  by  a  formal  specification. 

Some  published  RV  tools  and  techniques  are:  the  TemporalRover/DBRover  [D3],  PaX 
[HR]  and  RT-Mac  [SLS],  all  of  which  use  extensions  and  variants  of  Propositional 
Linear-time  Temporal  Logic  (PLTL)  as  the  specification  language  of  choice,  and  the 
StateRover  [SR]  that  uses  deterministic  and  non-deterministic  statechart  diagrams  as  its 
specification language. In [D2], Drusinsky describes the application of RV using
statechart assertions to the verification of DoD and NASA applications, and to those of
the Brazilian Space Agency.

Execution-based Model Checking (EMC) is a combination of RV and Automatic
Test Generation (ATG). With EMC, a large volume of automatically generated tests is
used to exercise the program or System Under Test (SUT), using RV on the other end to
check the SUT’s conformance to the formal specification. Some ATG tools that, when
combined with RV tools, create an EMC technique are the StateRover’s white-box
automatic test-generator [SR] and NASA’s Java Path Finder (JPF) [HP].

Runtime  Monitoring  (RM)  is  a  technique  for  monitoring  system  behavior  with 
respect  to  formally  specified  properties,  but  for  purposes  other  than  verification,  such  as 
performance  or  statistical  analysis.  In  the  remainder  of  this  paper  we  refer  to  RV  as  the 
union  of  RV  and  RM. 

In  [DMS],  the  authors  present  a  visual  tradeoff  space,  called  the  Formal 
Validation  and  Verification  (FV&V)  tradeoff  cuboid,  which  qualitatively  compares  three 
categories of FV&V techniques: Model Checking (MC), Theorem Proving (TP), and RV
combined with Automatic Test Generation (ATG). The tradeoff space compares the cost
and  test-space  coverage  associated  with  these  three  categories  of  techniques.  This 
tradeoff  space  highlights  the  wide  spectrum  of  systems  for  which  RV  has  a  favorable 
cost-performance  ratio. 

In  this  paper,  we  use  HMM’s  to  identify  hidden  events  and  sequences  thereof,  for 
the  purpose  of  subsequent  RV.  We  will  not  be  using  the  (rather  small)  probability  of  an 
observable  sequence,  but  rather  the  probability  of  a  hidden  state  being  reached  given  a 
sequence  of  observables.  Hence,  the  technique  identifies  hidden  events  with  a  relatively 
high  probability. 

This  paper  describes  an  extended  RV  technique  suitable  for  systems  in  which  not 
all  artifacts  are  necessarily  observable.  The  technique  is  a  novel  combination  of  Hidden 
Markov  Models  (HMM’s)  with  probabilistic  RV  of  formal  specification  assertions. 
Throughout the paper, we will be using the Statechart assertion formal specification
language of [D1, D2]. We will show a probabilistic variant of this formalism suitable for
RV of systems with hidden inputs.

Our  proposed  technique  is  suitable  for  the  verification  of  complex  systems  in 
which  visible  data  does  not  necessarily  contain  all  the  information  required  for 
monitoring the system’s health or for verifying its behavior, as in the case of telemetry
files of space missions. It is also suitable for monitoring the behavior of systems that are
not fully accessible, such as a nuclear facility or a distant unmanned vehicle, and for
forensic applications, such as behavioral analysis of a post-accident aircraft or automotive
system using black-box information.

The  rest  of  the  paper  is  organized  as  follows.  Section  2  provides  an  overview  of 
RV  using  UML-based  statechart  assertions.  Section  3  provides  an  overview  of  HMM’s 
and  HMM  related  algorithms.  Section  4  describes  our  proposed  extended-RV architecture 
and  process  that  uses  a  combination  of  hidden  and  visible  data,  using  an  HMM  connected 
to  a  special  formal  specifications  monitor.  Sections  5,  6  and  8  provide  specific  details  of 
the  two  key  components  of  this  process:  section  5  describes  the  HMM  component, 
section  6  describes  the  operation  of  the  formal  specifications  monitor,  and  section  8 
describes  three  techniques  for  computing  the  probability  distribution  used  by  that 
monitor.  While  sections  5  and  6  focus  on  formal  specification  assertions  with  hidden  data 
-  manifested  as  UML  statechart  conditions,  section  7  extends  the  technique  to  formal 
specification  assertions  with  hidden  events.  Section  9  extends  the  technique  to  assertions 
with  hidden  continuous  data.  Finally,  section  10  compares  our  suggested  extended-RV 
architecture  with  two  alternative  architectures. 

2.  RV  OF  (DETERMINISTIC)  FORMAL  SPECIFICATION 
ASSERTIONS  -  AN  OVERVIEW 

Runtime  Verification  (RV)  is  a  light-weight  formal  verification  technique  in  which  the 
runtime  execution  of  a  system  is  monitored  and  compared  to  an  executable  version  of  the 
system’s  formal  specification.  In  other  words,  RV  behaves  as  an  automated  observer  of 
the  program’s  behavior  and  compares  that  behavior  with  the  expected  behavior  per  the 
formal specification.

The following formal specification example will be used throughout the rest of
the paper.

Consider the following Traffic Light Controller (TLC) requirement R1: whenever
vehicle speed in the Main direction is greater than 42 km/h for more than 2 consecutive
minutes while lights in the Main direction are green, then lights in that direction should
turn red within 30 seconds afterwards.

Figure 1 depicts a statechart-assertion for R1. As described in [D1, D2], a
statechart-assertion is a UML state-machine augmented with a Java action language and a
built-in Boolean flag named bSuccess, whose value indicates whether the assertion is
succeeding (i.e., the input scenario conforms to R1) or failing (i.e., the input scenario
violates R1).

The statechart-assertion of Fig. 1 starts up in the top-level Init state. When lights
turn green (lightsTurnedGreen event) it transitions to the Init state of the OnGoing
sub-state of the Green super-state, where it polls until the Speed variable becomes HIGH
(using a 1Hz clock tick event named clockTick); the assertion then transitions to the
SpeedHigh state. It then polls for Speed to become non-HIGH within 2 minutes. If the Speed
value is or becomes not HIGH then the assertion waits in Green.OnGoing.Init until Speed
turns HIGH again. If two minutes have elapsed then the assertion waits for an additional
30 seconds, during which it checks whether lights have turned red as required. If so, then
the process restarts in the top-level Init state. Otherwise, R1 has been violated and the
assertion transitions to the Error state where it sets the bSuccess flag to false. This flag
indicates that the assertion has failed.

Figure 1. A statechart-assertion for requirement R1.

Fig.  2  illustrates  the  conventional  RV  architecture:  an  executable  formal 
specification  assertion  observes  inputs  and  outputs  of  the  SUT  (the  TLC  in  our  example), 
and  compares  those  sequences  to  the  expected  behavior;  whenever  that  actual  behavior 
violates the requirement the specification announces a failure.

Figure 2. The RV architecture for the TLC and requirement R1.

Fig. 3 depicts two timeline diagrams of validation tests for the assertion of Fig. 1,
i.e., tests that assure the statechart-assertion correctly implements the natural language
requirement R1. Fig. 3a depicts a test scenario that conforms to R1 - checking that the
assertion succeeds for this scenario, as expected. Fig. 3b depicts a test scenario that
violates R1 - checking that the assertion fails for this scenario, as expected.

Validation testing is an important step in the process because the formal-specification
assertion is to be trusted to represent requirement R1 in the subsequent
automated verification phase, discussed below (see footnote 1).

a. Timeline diagram for validation test Test1.

b. Timeline diagram for validation test Test2. R1 is violated by this scenario (as
indicated by the JUnit AssertFalse arrow) because Speed is HIGH for more than two
minutes while lights are green, yet lights didn’t turn red as required.

Figure 3. Timeline diagrams for two validation tests for the statechart-assertion of Fig. 1.

Verification is performed by comparing a trace of the system (e.g., as captured by
a log file) to the behavior of the assertion set. The StateRover tool does so using a
two-step process. First, the log file is converted into an equivalent JUnit test [JU], and the
assertion is code-generated into an equivalent Java class (details about this code generator
are available in [D1]). Next comes the RV step: the JUnit test is executed, thereby
checking that the log-file trace conforms to the requirement as manifested by the
assertion.

1 Further details about validation testing are available in [D2].

The  extended-RV  technique  suggested  in  this  paper  uses  the  same  process  for  the 
development  and  validation  of  assertions,  i.e.,  assertions  are  developed  as  deterministic 
assertions. However, rather than performing deterministic RV by virtue of using an
assertion code generator that generates a deterministic implementation, our technique
performs  probabilistic  RV  using  a  special  assertion  code  generator  that  generates  a 
probabilistic,  weighted  implementation.  Specific  details  are  provided  in  section  6. 

3.  HIDDEN  MARKOV  MODELS 

A (discrete) hidden Markov model (HMM) is a statistical Markov model in which
the system being modeled is assumed to be a Markov process with unobserved, or hidden,
states. In a regular Markov model, the state is directly visible to the observer; in a
hidden Markov model the state is not directly visible, while the output, which depends on
the state, is visible.

The parameters of a simple HMM are [Ra]:

• N, the number of states in the model. Individual states are denoted $S = \{s_1, s_2, \ldots, s_N\}$,
and the state at time t as $q_t$.

• M, the number of distinct observation symbols. Individual symbols are denoted
$V = \{v_1, v_2, \ldots, v_M\}$.

• The state transition probability distribution $A = \{a_{ij}\}$, where
$a_{ij} = P[q_{t+1} = s_j \mid q_t = s_i]$, $1 \le i, j \le N$. Clearly, $\forall i$, $1 \le i \le N$, $\sum_{1 \le j \le N} a_{ij} = 1$.

• The observation symbol probability distribution in state j, $B = \{b_j(k)\}$, where
$b_j(k) = P[v_k \text{ at } t \mid q_t = s_j]$, $1 \le j \le N$, $1 \le k \le M$.

• The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i = P[q_1 = s_i]$, $1 \le i \le N$.

Rabiner [Ra] describes the following three primary problems associated with
HMM’s:

1. Given the observation sequence $O = O_1 O_2 \ldots O_T$ and an HMM model $\lambda = (A, B, \pi)$,
how do we efficiently compute $P(O \mid \lambda)$?

2. Given the observation sequence $O = O_1 O_2 \ldots O_T$ and an HMM model $\lambda = (A, B, \pi)$,
how do we choose an optimal state sequence $Q = q_1 q_2 \ldots q_T$?

3. How do we calculate the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

The best-known algorithms used to solve these problems are:

1. The forward algorithm, for calculating the forward variable
$\alpha_t(i) = P(O_1 O_2 \ldots O_t, q_t = s_i \mid \lambda)$. The forward algorithm is a dynamic
programming algorithm based on the recurrence:

$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})$, $1 \le t \le T-1$, $1 \le j \le N$,

with the initialization:

$\alpha_1(j) = \pi_j\, b_j(O_1)$.

Note that $P(O_1 O_2 \ldots O_t \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)$.

$\alpha'$ is the normalized version of $\alpha$:

$\alpha'_t(j) = P(q_t = s_j \mid O_1 O_2 \ldots O_t, \lambda)$, calculated as:

$\alpha'_{t+1}(j) = \alpha_{t+1}(j) / P(O_1 O_2 \ldots O_{t+1} \mid \lambda)$.

2. The backward algorithm, for calculating the backward variable
$\beta_t(i) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = s_i, \lambda)$. The algorithm is a dynamic
programming algorithm based on the recurrence:

$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$, for $t = T-1, T-2, \ldots, 1$ and $1 \le i \le N$,

with the initialization:

$\beta_T(i) = 1$, for $1 \le i \le N$.

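3. The forward-backward algorithm, for calculating the forward-backward variable

$\gamma_t(i) = P(q_t = s_i \mid O_1 \ldots O_T, \lambda)$.

$\gamma$ is also:

$\gamma_t(i) = \alpha_t(i)\, \beta_t(i) / P(O_1 O_2 \ldots O_T \mid \lambda)$.

$\gamma$ can also be expressed as:

$\gamma_t(i) = \sum_{1 \le j \le N} \xi_t(i,j)$, where:

$\xi_t(i,j) = \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) / P(O_1 O_2 \ldots O_T \mid \lambda)$.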

4. The Viterbi algorithm, for calculating the best state sequence that explains an
observation sequence, $\delta_T(O_1 O_2 \ldots O_T \mid \lambda)$. The algorithm defines:

$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_t = s_i, O_1 O_2 \ldots O_t \mid \lambda)$,

and uses the following recursive formula:

$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(O_t)$

along with the following formula, used to recover the actual most probable
state sequence:

$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]$, where $\psi_1(j) = 0$.

The Viterbi algorithm is essentially the forward algorithm with a recurrence in
which a max operator is used instead of the sum. The probability of the best state
sequence, $\delta_T(O_1 O_2 \ldots O_T \mid \lambda)$, is then the maximal $\delta_T(i)$, $1 \le i \le N$, and
$q_T = \arg\max_i \delta_T(i)$, $1 \le i \le N$.

The most probable state sequence $q_1, q_2, \ldots, q_T$ is calculated in a backward
manner, using $q_{t-1} = \psi_t(q_t)$.
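
The four algorithms above can be sketched compactly in Python. This is a minimal, unscaled illustration and not the authors' implementation; the function names and the two-state toy model (`A`, `B`, `pi`, `obs`) are assumptions made up for the example, and list indices are zero-based whereas the text counts time and symbols from 1.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(O_1..O_t, q_t = s_i | lambda); list index t is zero-based."""
    n = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(O_{t+1}..O_T | q_t = s_i, lambda)."""
    n = len(A)
    beta = [[1.0] * n]                    # initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):           # fills beta_{T-1}, ..., beta_1
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def gamma(A, B, pi, obs):
    """gamma[t][i] = alpha_t(i) * beta_t(i) / P(O | lambda)."""
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    return [[a * b / p_obs for a, b in zip(at, bt)]
            for at, bt in zip(alpha, beta)]


def viterbi(A, B, pi, obs):
    """Return (most probable state sequence, its joint probability)."""
    n = len(pi)
    delta = [pi[j] * B[j][obs[0]] for j in range(n)]
    psi = []                              # back-pointers, one list per step
    for o in obs[1:]:
        back = [max(range(n), key=lambda i: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[back[j]] * A[back[j]][j] * B[j][o] for j in range(n)]
        psi.append(back)
    path = [max(range(n), key=lambda i: delta[i])]
    for back in reversed(psi):            # q_{t-1} = psi_t(q_t)
        path.insert(0, back[path[0]])
    return path, max(delta)


# Hypothetical two-state, two-symbol model used only to exercise the functions.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
obs = [0, 0, 1]
path, p_best = viterbi(A, B, pi, obs)
print(path, p_best)
```

For sequences much longer than this toy example the unscaled products underflow, which is one reason the normalized variable $\alpha'$ is of interest.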

4.  RV  OF  SYSTEMS  WITH  HIDDEN  STATES 

Suppose our TLC is being monitored or verified. Suppose also that, as assumed
by the statechart-assertion of Figure 1, it emits 3 color change events (lightTurnedRed,
lightTurnedGreen, and lightTurnedYellow), but it does not have a Speed input or output.
Instead, the TLC has input sensors that measure the frequency of cars going through the
junction in a particular direction (e.g., in the Main direction). In other words, frequency is
an observable whereas speed is a hidden artifact.

To enable RV of the TLC with respect to R1 and its corresponding statechart-assertion,
we modify the architecture of Fig. 2 as depicted in Fig. 4. This architecture
differs from the conventional RV architecture of Fig. 2 in three main aspects:

1.  It  contains  a  Hidden  Markov  Model  (HMM),  used  to  decode  the  probability  of 
occurrence  of  sequences  of  hidden  Speed  states  given  sequences  of  the  frequency 
observable.  This  HMM  provides  a  plurality  of  weighted  Speed  inputs  to  the 
statechart-assertion,  instead  of  a  unique  un-weighted  Speed  input  used  in  Fig.  1. 
Details of the HMM are discussed below.

2.  It  uses  a  special  code  generator  that  generates  a  probabilistic  implementation  for  the 
statechart  assertion(s),  one  that  operates  on  the  weighted  inputs  from  the  HMM. 

3. It evaluates the assertion using a success score in the range [0,1].


Figure 4. The RV architecture for the TLC and requirement R1 when the Speed input is hidden.

In our example, visible frequency measurement pertains to a sensor under the
Main Street that measures the frequency of cars driving over the sensor. The sensor
produces symbols $f_1, f_2, \ldots, f_5$, where $f_d$ represents a measured frequency in the range of
$(d-1, d]$ cars per second, for all $d = 1, \ldots, 5$ (see footnote 2). Loosely speaking, using a
4 meter per car metric (including car to car spacing), $Speed = 14.4 \cdot f$ km/h. We categorize
3 ranges of speeds for cars going over the sensor, as follows: (i) HIGH: car speeds above
40 km/h, (ii) LOW: car speeds below 15 km/h, and (iii) MED: for all other possibilities.
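
The stationary estimate follows from unit conversion: 4 m per car times f cars per second is 4f m/s, and 1 m/s equals 3.6 km/h, hence 14.4f km/h. The short sketch below applies that estimate and the three speed ranges; the function and constant names are hypothetical and serve only to illustrate the arithmetic.

```python
METERS_PER_CAR = 4.0   # road length per car, including spacing (from the text)
MS_TO_KMH = 3.6        # 1 m/s = 3.6 km/h


def speed_kmh(freq_cars_per_sec):
    """Stationary speed estimate in km/h for a measured car frequency."""
    return METERS_PER_CAR * freq_cars_per_sec * MS_TO_KMH


def speed_range(speed):
    """Map a speed in km/h to the HIGH / MED / LOW ranges defined in the text."""
    if speed > 40:
        return "HIGH"
    if speed < 15:
        return "LOW"
    return "MED"


for f in (1, 2, 3):
    print(f, speed_kmh(f), speed_range(speed_kmh(f)))
```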

While we could use the above-mentioned stationary process to deduce the hidden
Speed value-range from the visible frequency measurement, it does not account for the
dynamic aspects of the system. First, it does not account for the fact that distances
between cars change with car-speed, rendering the 4 meters per car estimate inaccurate.
Also, it is expected for Speed to seldom change from HIGH to LOW directly.

Consequently,  we  use  an  HMM  to  model  this  random  process.  Figure  5  depicts  an 
HMM  for  the  TLC  example.  Its  parameters  are: 

• The state set Q consists of three states that correspond with Speed, namely, HIGH,
MED, and LOW, also denoted as states 0, 1, and 2, respectively. Note that it is not a
coincidence that the HMM states capture the hidden variable in the assertion of Fig. 1;
we will discuss this relationship in section 5.

• An observable O, which takes on one of the $f_d$ symbols discussed earlier.

2 We assume that frequencies above 5 cars/sec are measured as 5 cars/sec.

• Transition probabilities are indicated along the edges of Fig. 5.


Figure  5.  Speed  random  variable  HMM  states  and  transition. 

• $b_s(O)$, the probability of an observable O being observed in state s, is listed in Table 1.

O \ state    HIGH    MED     LOW
f1           0.02    0.18    0.63
f2           0.22    0.53    0.26
f3           0.47    0.17    0.11
f4           0.2     0.11    0
f5           0.1     0.01    0

Table 1. Probability of observation O in TLC state s.

•  The  initial  state  distribution  is  [0.3,  0.5,  0.2]  for  HIGH,  MED,  and  LOW, 
respectively. 

RV  now  proceeds  according  to  the  process  illustrated  in  Fig.  4,  as  follows. 
Sampled  frequency  values  are  periodically  fed  into  the  HMM,  which  then  executes  a 
probability  estimation  algorithm,  such  as  the  forward-algorithm  for  the  current  iteration 
(section  8  discusses  three  probability  estimation  techniques).  These  probability  values 
represent  probabilities  of  the  HMM  being  in  states  HIGH,  MED,  and  LOW,  respectively. 
This  vector  of  symbols  and  corresponding  probabilities  is  passed  to  the  assertion’s 
implementation  code,  which  executes  a  weighted  version  of  a  state-machine  state  change, 
detailed  in  section  6.  Finally,  as  discussed  in  section  6,  the  assertion  announces  the 
probability  it  detected  a  requirement  violation. 
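
One cycle of this process, using the forward algorithm in its normalized form, might look like the following sketch. The emission rows and the initial distribution are copied from Table 1 and the text; the transition matrix `A` is a placeholder assumption, because the edge probabilities of Fig. 5 are not reproduced in the text, and `pod_step` is a hypothetical name.

```python
STATES = ["HIGH", "MED", "LOW"]
PI = [0.3, 0.5, 0.2]                       # initial state distribution (from the text)

# B[state][d - 1] = b_state(f_d), copied from Table 1.
B = [[0.02, 0.22, 0.47, 0.20, 0.10],       # HIGH
     [0.18, 0.53, 0.17, 0.11, 0.01],       # MED
     [0.63, 0.26, 0.11, 0.00, 0.00]]       # LOW

# ASSUMPTION: placeholder transition probabilities; Fig. 5 holds the real ones.
A = [[0.80, 0.19, 0.01],
     [0.15, 0.70, 0.15],
     [0.01, 0.19, 0.80]]


def pod_step(prev_pod, d):
    """Return the normalized forward vector after observing symbol f_d.

    prev_pod is the previous cycle's vector, or None on the first cycle.
    """
    n = len(STATES)
    if prev_pod is None:
        predicted = PI
    else:
        predicted = [sum(prev_pod[i] * A[i][j] for i in range(n))
                     for j in range(n)]
    unnormalized = [predicted[j] * B[j][d - 1] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


pod = None
for d in [3, 4, 4, 5]:                     # sampled symbols f_3, f_4, f_4, f_5
    pod = pod_step(pod, d)
    print(dict(zip(STATES, (round(p, 3) for p in pod))))
```

Normalizing on every cycle keeps the vector a proper distribution over HIGH, MED, and LOW, which is the form the weighted assertion consumes.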

A more realistic HMM for deducing car speed is one in which the observable
frequency is a continuous random variable (called Frequency), e.g., with a normal
distribution whose probability density function (PDF) is $f_O(o) \sim N(\mu, \sigma^2)$, with
parameters listed in Table 2, rather than a Categorical distribution (as is the case for the
TLC example of Table 1). Using the TLC example again, the probability estimation
algorithm of choice (elaborated in section 8) will use $f_{Frequency}(frequency, j)$, the
Frequency PDF in state j, instead of $b_j(frequency)$.


State            HIGH     MED    LOW
μ (cars/sec)     3.125    2      0.55
σ (cars/sec)     0.35     1      0.50

Table 2. Normal distribution parameters of observation O in TLC states.
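
With continuous observations, the per-state density value takes the place of the table lookup $b_j(O)$. The sketch below evaluates the normal PDF for each state using the Table 2 parameters; the names `PARAMS`, `normal_pdf`, and `emission_densities` are hypothetical.

```python
import math

# (mu, sigma) in cars/sec for each state, copied from Table 2.
PARAMS = {"HIGH": (3.125, 0.35), "MED": (2.0, 1.0), "LOW": (0.55, 0.50)}


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def emission_densities(frequency):
    """Per-state density of the measured frequency; replaces b_j(O)."""
    return {state: normal_pdf(frequency, mu, sigma)
            for state, (mu, sigma) in PARAMS.items()}


for freq in (0.5, 2.0, 3.1):
    dens = emission_densities(freq)
    print(freq, max(dens, key=dens.get))
```

These values are densities rather than probabilities, so they may exceed 1; that is harmless once the resulting vector is normalized.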

5. FROM ASSERTIONS TO HMM PARAMETER ESTIMATION


HMM  parameter  estimation,  i.e.,  estimating  the  transition  probability  and 
probability  of  state  observations,  is  a  difficult  problem.  In  particular,  it  is  difficult  to 
estimate  the  number  of  HMM  states,  the  extreme  cases  being  using  one  state  (i.e., 
reducing  the  HMM  to  a  stationary  process)  or  n  states,  n  being  the  length  of  the 
observation  sequence. 

In our case however, HMM states are known; they are directly related to the
hidden artifacts in the assertions. For example, in the TLC case, the three hidden symbols
pertain to Speed values HIGH, MED, LOW, which are derived from Fig. 1 and its
requirement R1, as well as from an assertion for the following requirement:

R2:  if  vehicle  speed  in  the  Main  direction  is  between  15  and  30  km/h  for  more 
than  2  consecutive  minutes  while  lights  in  the  Main  direction  are  green,  then  lights 
should  remain  green  for  a  total  of  4  minutes. 


Fig. 6a depicts a statechart assertion for requirement R2, and Fig. 6b depicts a
timeline diagram for a validation test for this assertion.

a. Statechart assertion and validation test for requirement R2



b.  A  timeline  diagram  of  a  validation  test  for  the  statechart-assertion  of  (a) 
Figure  6.  Statechart  assertion  and  validation  test  for  requirement  R2. 


Our  use-case  for  HMM’s  is  simpler  than  usual  in  one  additional  aspect: 
calculating  transition  and  observable  probabilities.  Because  HMM  states  relate  to  real 
world  artifacts  (e.g.,  car  speed  values),  we  can  conduct  learning-phase  experiments  which 
measure  relative  frequencies,  such  as  one  in  which  all  speeds  and  sensor  frequencies  are 
measured on a 1-second period basis; all HMM probabilities follow trivially. This is the
case whether observables are distributed using a Categorical distribution or some
continuous distribution.
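
For the Categorical case, such a learning-phase computation reduces to counting. The sketch below assumes the experiment yields runs of (speed state, frequency symbol) pairs sampled once per second; the function name `estimate_hmm` and the sample data are hypothetical.

```python
from collections import Counter


def estimate_hmm(states, symbols, runs):
    """Estimate (A, B, pi) as relative frequencies from labelled runs.

    runs is a list of sequences of (state, symbol) pairs, one pair per sample.
    """
    first = Counter(run[0][0] for run in runs)
    trans = Counter((a[0], b[0]) for run in runs for a, b in zip(run, run[1:]))
    emit = Counter(pair for run in runs for pair in run)

    def normalise(counts):
        total = sum(counts)
        # Rows with no samples are left as zeros rather than guessed.
        return [c / total if total else 0.0 for c in counts]

    pi = normalise([first[s] for s in states])
    A = [normalise([trans[(s, r)] for r in states]) for s in states]
    B = [normalise([emit[(s, v)] for v in symbols]) for s in states]
    return A, B, pi


states = ["HIGH", "MED", "LOW"]
symbols = [1, 2, 3, 4, 5]                  # d for each symbol f_d
runs = [[("HIGH", 3), ("HIGH", 4), ("MED", 2)],
        [("MED", 2), ("LOW", 1)]]          # made-up measurements
A_est, B_est, pi_est = estimate_hmm(states, symbols, runs)
print(pi_est, A_est[0], B_est[1])
```

A real experiment would need enough samples that every state row has observations; the zero rows produced here for unseen states are a limitation of the sketch.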

Consequently,  we  can  deduce  the  workflow  for  developing  the  components  of  the 
architecture  of  Fig.  4,  as  depicted  in  Fig.  7. 


[Figure 7 boxes: English requirement; Draw deterministic statechart-assertions; Update/fix assertion; Identify set H of hidden events and data artifacts in the assertion; Define HMM states as symbols of H; Run learning-phase experiments to determine HMM parameters; Generate weighted-assertion-code for assertion (see section 6); Perform probabilistic RV/RM based on architecture of Fig. 4.]

Figure 7. Workflow for developing the RV components of Fig. 4.

6. RV OF ASSERTIONS WITH PROBABILISTIC INPUTS

Using the architecture of Fig. 4, the formal specification assertion module
observes sequences that consist of visible as well as hidden artifacts; in Fig. 1 for
example, the lightsTurnedRed, lightsTurnedYellow, lightsTurnedGreen, timeoutFire, and
clockTick events are visible, while Speed is hidden. Hidden artifacts have an associated
probability distribution which we call the probability-of-occurrence distribution (POD),
such as POD-1: Speed = HIGH, MED, LOW at time 5 occurs with probability 0.72, 0.2,
0.08, respectively. Section 8 describes three techniques, called $\alpha'$, $\gamma$, and $\delta''$, for
computing the cycle-by-cycle POD, based on $\alpha$, $\gamma$, and $\delta$, respectively. We consider a
visible artifact to have a probability of occurrence of 1.

A weighted/probabilistic implementation of the statechart assertion module of
Fig. 4 responds to an input sequence
$I = \langle S_1, P_1 \rangle, \langle S_2, P_2 \rangle, \ldots, \langle S_T, P_T \rangle$, where $S_t$ is a
visible or hidden artifact (i.e., an event such as clockTick, or a data artifact, i.e., a variable, such
as Speed, both in Fig. 1), and $P_t$ is the POD of $S_t$.

We use the UML notation for $S_t$, $S_t = event_t[condition_t]$, where $condition_t$ is
optional; $event_t$ and $condition_t$ can either or both be visible or hidden.

An assertion’s implementation consists of a collection C of instances, or copies,
of the assertion, called configurations. Each configuration executes as a standalone
assertion and preserves its own present-state. Each configuration Con has a probability
measure P(Con), called the Configuration Probability Measure (CPM), that measures the
probability the assertion is behaving as suggested by Con, i.e., that its present-state is
Con’s present-state. Upon startup, C consists of a single configuration $Con_{default}$ whose
present-state, denoted $PS(Con_{default})$, is the assertion’s default state (e.g., state Init in Fig.
1), and having $P(Con_{default}) = 1$.

All configurations of C respond to a pair $\langle S_t, P_t \rangle$ of I, as follows. If $P_t = 1$ then
the configuration performs a conventional state machine state change upon input $S_t$, such
as the timeoutFire transition from SpeedHigh to SpeedHighFor2Min in Fig. 1. Otherwise, either
$event_t$ or $condition_t$ is hidden. In this case the configuration Con is replaced with two
configurations, Con1 and Con2, whose present-state probabilities are calculated as
follows:

• If $event_t$ is hidden (as discussed in section 7) then $P(Con1) = P(Con) \cdot P_t$ and
$P(Con2) = P(Con) \cdot (1 - P_t)$.

• If $condition_t$ is hidden, then we calculate $P(condition_t)$, the probability of the
condition, as a function of the probabilities of its constituent variables using standard
probability. For example, if $condition_t$ is Speed = HIGH || Speed = MED then
$P(condition_t) = P(Speed = HIGH) + P(Speed = MED)$, where each term is taken from
the POD at time t, such as 0.72 and 0.2 respectively, using POD-1.

We set $P(Con1) = P(Con) \cdot P(condition_t)$ and $P(Con2) = P(Con) \cdot (1 - P(condition_t))$.

Let PS(Con) denote Con’s present-state. PS(Con1) and PS(Con2) are determined
as follows:

• If $event_t$ is hidden (as discussed in section 7) then PS(Con1) is the next state
determined by the assertion’s transition out of PS(Con), under the assumption that the
event fired, and PS(Con2) = PS(Con).

• If $condition_t$ is hidden (e.g., the Speed == HIGH condition in Fig. 1), then PS(Con1) is
calculated assuming $condition_t$ = true and PS(Con2) is calculated assuming
$condition_t$ = false.

For the sake of simplicity we disallow assertions in which both $event_t$ and
$condition_t$ are hidden.

The configurations of C are routinely (i.e., every cycle t) managed as follows. All
configurations Con' with the same present-state (see footnote 3) are merged into a single
configuration $Con_{merged}$, using the sum of all P(Con') as $P(Con_{merged})$.

The  statechart  assertion  declares  a  probability  of  failure  (POF),  i.e.,  the 
probability  its  corresponding  requirement  has  been  violated,  on  a  cycle  by  cycle  basis, 
being  the  sum  of  all  P(Con )  for  all  configurations  Con  such  that  PS(Con)  is  an  error  state. 

Note that statechart assertions typically have error states that are sink states, i.e.,
states with no outgoing transitions. For such assertions, the POF is monotonically
non-decreasing with time.
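
The split, merge, and POF steps for a hidden condition can be sketched as follows. This is an illustration under simplifying assumptions and not the code produced by the special code generator: a configuration is keyed by its state name only (the extended state of footnote 3 is ignored), every cycle is treated as a hidden-condition cycle, and the three-state toy assertion `toy_next` is hypothetical rather than the assertion of Fig. 1.

```python
from collections import defaultdict


def step_hidden_condition(configs, next_state, p_true):
    """Advance all configurations by one cycle on a hidden condition.

    configs maps present-state -> CPM. Each configuration is split into a
    'condition true' copy and a 'condition false' copy, and copies that
    share a present-state are merged by summing their CPMs.
    """
    merged = defaultdict(float)
    for state, cpm in configs.items():
        merged[next_state(state, True)] += cpm * p_true
        merged[next_state(state, False)] += cpm * (1.0 - p_true)
    return {s: p for s, p in merged.items() if p > 0.0}


def probability_of_failure(configs, error_states):
    """POF: total CPM of configurations whose present-state is an error state."""
    return sum(p for s, p in configs.items() if s in error_states)


def toy_next(state, speed_high):
    """Hypothetical assertion: two consecutive HIGH cycles lead to Error."""
    if state == "Error":
        return "Error"                     # sink state
    if not speed_high:
        return "Init"
    return {"Init": "Hot", "Hot": "Error"}[state]


configs = {"Init": 1.0}                    # single default configuration, CPM 1
for p_high in [0.72, 0.9]:                 # P(Speed == HIGH) on two cycles
    configs = step_hidden_condition(configs, toy_next, p_high)
pof = probability_of_failure(configs, {"Error"})
print(configs, pof)
```

Because merging keys on the present-state, the number of configurations is bounded by the number of (extended) states of the assertion rather than doubling on every cycle.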

7.  RV  OF  ASSERTIONS  WITH  HIDDEN  EVENTS 


UML statecharts, message sequence diagrams (MSC’s), and other formalisms are
intrinsically event driven. In fact, the statechart assertions of Figures 1 and 6a are event
driven, using events such as lightsTurnedRed and the 1Hz clockTick event. However, as
presented in section 4, HMM symbols are propositional in nature, manifested as the
states of the HMM, such as the Speed variable. Consequently, the assertions of Figures 1
and 6a must poll the Speed variable using the 1Hz clockTick event. In contrast, Fig. 8
depicts an event driven assertion for requirement R1; it uses hidden events
speedChangedToHIGH and speedChangedFromHIGH.




3 More accurately, PS(Con) is an extended state vector, that includes the state variable and the states of
all local variables, such as the timer state and the bSuccess flag.

Figure 8. An event driven statechart assertion for requirement R1 that uses hidden events.

The probability of these two events is induced by the probability of an HMM
transition from state i to state j being traversed at time t, i.e., by $\xi_t(i,j)$. Hence, their
probabilities are:

1. $P(\text{event speedChangedToHIGH occurring at time } t \mid O, \lambda) = \sum_{1 \le i \le N} \xi_t(i, 0)$.

2. $P(\text{event speedChangedFromHIGH occurring at time } t \mid O, \lambda) = \sum_{1 \le j \le N} \xi_t(0, j)$.
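The two event probabilities can be sketched in code. The following is a minimal illustration under stated assumptions: the HMM parameters `A`, `B`, `PI` and the observation sequence `OBS` are made up, index 0 is assumed to denote the HIGH state, and the sums range over the remaining states so that the HIGH-to-HIGH self-transition does not count as a change. The unscaled forward and backward passes are adequate only for short sequences.

```python
# Hypothetical 3-state HMM; index 0 is assumed to denote HIGH, 1 MED, 2 LOW.
HIGH = 0
A = [[0.80, 0.15, 0.05],
     [0.20, 0.60, 0.20],
     [0.05, 0.25, 0.70]]
B = [[0.1, 0.2, 0.3, 0.4],   # B[i][o]: probability of observable o in state i
     [0.2, 0.4, 0.3, 0.1],
     [0.5, 0.3, 0.1, 0.1]]
PI = [0.3, 0.4, 0.3]
OBS = [0, 1, 2, 1, 3]


def forward(A, B, pi, obs):
    """Unscaled forward variables alpha_t(i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Unscaled backward variables beta_t(i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def xi(A, B, pi, obs):
    """xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, lambda) for each t."""
    n = len(pi)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    p_obs = sum(alpha[-1])
    return [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
              for j in range(n)] for i in range(n)]
            for t in range(len(obs) - 1)]


XI = xi(A, B, PI, OBS)
others = [s for s in range(len(PI)) if s != HIGH]
# Probability of each hidden event occurring at each time step.
p_changed_to_high = [sum(x[i][HIGH] for i in others) for x in XI]
p_changed_from_high = [sum(x[HIGH][j] for j in others) for x in XI]
```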

8.  GENERATING  THE  PROBABILITY  OF  OCCURRENCE  OF  A 
HIDDEN  ARTIFACT 

We propose three techniques for estimating the POD at time t: the alpha, gamma,
and delta methods, as follows.

• The alpha method, which uses N values of $\alpha'_t(i) = P(q_t = s_i \mid O_1 O_2 \ldots O_t, \lambda)$, one per
symbol $s_i$, $1 \le i \le N$. Note that $\sum_{1 \le i \le N} \alpha'_t(i) = 1$.

• The gamma method, which uses N values of $\gamma_t(i) = P(q_t = s_i \mid O_1 O_2 \ldots O_T, \lambda)$, one per
symbol $s_i$, $1 \le i \le N$. Note that $\sum_{1 \le i \le N} \gamma_t(i) = 1$.

• The delta method, which uses N values of:

$\delta''_t(i) = \delta'_t(i) / \sum_{1 \le j \le N} \delta'_t(j)$, where

$\delta'_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_t = s_i \mid O_1 O_2 \ldots O_t, \lambda) = \delta_t(i) / P(O_1 O_2 \ldots O_t)$.

In other words, $\delta''_t(i)$ is a
normalized version of $\delta'_t(i)$, which in turn is the probability of the HMM generating
symbol $s_i$ at time t, via the most probable state sequence, given the observation.
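The three estimates can be sketched side by side. This is an illustrative sketch rather than the paper's implementation: the HMM parameters and observation sequence are made up, and the function names `alpha_method`, `gamma_method`, and `delta_method` are hypothetical. The forward pass is normalized at every step, which yields the same $\alpha'_t(i)$ as normalizing the raw forward variable.

```python
# Hypothetical HMM parameters (rows of A and B each sum to 1).
A = [[0.80, 0.15, 0.05],
     [0.20, 0.60, 0.20],
     [0.05, 0.25, 0.70]]
B = [[0.1, 0.2, 0.3, 0.4],
     [0.2, 0.4, 0.3, 0.1],
     [0.5, 0.3, 0.1, 0.1]]
PI = [0.3, 0.4, 0.3]
OBS = [0, 1, 2, 1, 3]


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def alpha_method(A, B, pi, obs):
    """alpha'_t(i) = P(q_t = s_i | O_1..O_t): forward pass only."""
    n = len(pi)
    cur = normalize([pi[i] * B[i][obs[0]] for i in range(n)])
    out = [cur]
    for o in obs[1:]:
        cur = normalize([sum(cur[i] * A[i][j] for i in range(n)) * B[j][o]
                         for j in range(n)])
        out.append(cur)
    return out


def gamma_method(A, B, pi, obs):
    """gamma_t(i) = P(q_t = s_i | O_1..O_T): needs the whole sequence."""
    n = len(pi)
    alphas = alpha_method(A, B, pi, obs)
    beta = [1.0] * n
    out = [alphas[-1]]  # at t = T the backward variable is 1
    for t in range(len(obs) - 2, -1, -1):
        beta = [sum(A[i][j] * B[j][obs[t + 1]] * beta[j] for j in range(n))
                for i in range(n)]
        out.insert(0, normalize([alphas[t][i] * beta[i] for i in range(n)]))
    return out


def delta_method(A, B, pi, obs):
    """delta''_t(i): Viterbi variable delta_t(i), normalized over i."""
    n = len(pi)
    cur = [pi[i] * B[i][obs[0]] for i in range(n)]
    out = [normalize(cur)]
    for o in obs[1:]:
        cur = [max(cur[i] * A[i][j] for i in range(n)) * B[j][o]
               for j in range(n)]
        out.append(normalize(cur))
    return out


alpha_p = alpha_method(A, B, PI, OBS)
gamma = gamma_method(A, B, PI, OBS)
delta_pp = delta_method(A, B, PI, OBS)
```

At the last time step the gamma and alpha estimates coincide, since no future observations remain; at the first time step the alpha and delta estimates coincide, since there is only one path prefix.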

The gamma method is a forward-backward algorithm; it therefore requires the
entire observable sequence $O_1 O_2 \ldots O_T$ for the evaluation of $\gamma_t(i)$ for t < T. The alpha and
delta methods, on the other hand, are forward algorithms and therefore do not require
future information for the calculation of $\alpha'_t(i)$ and $\delta''_t(i)$. Nevertheless, scaling issues
discussed below effectively imply that no matter what method is used, it can only be used
verbatim with a limited number of observables. In section 10 we suggest a remedy to this
limitation.

When the HMM contains transitions with probability 0, all three methods
might induce sequences of symbols that cannot be physically generated. For example,
consider an HMM with N=3 and $a_{1,2} = 0$, and suppose $\gamma_t(1) = 0.3$ and $\gamma_{t+1}(2) = 0.2$. The
assertion then considers the sequence $s_1, s_2$ as possible, having a positive probability of
0.06.
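The arithmetic behind this example can be made explicit. In the sketch below, only the values 0.3, 0.2, and the zero transition probability come from the text; the forward, backward, and emission values passed to the hypothetical helper `joint_transition_probability` are made up, and any other choice gives the same result because the transition probability is 0.

```python
# Values from the example in the text.
gamma_t_s1 = 0.3    # gamma_t(1)
gamma_t1_s2 = 0.2   # gamma_{t+1}(2)
a_12 = 0.0          # the transition s_1 -> s_2 cannot occur

# Feeding per-step marginals to the assertion treats consecutive steps as
# independent, so the sequence s_1, s_2 receives the product of the marginals.
p_product = gamma_t_s1 * gamma_t1_s2


def joint_transition_probability(alpha_t_i, a_ij, b_j_next, beta_next_j, p_obs):
    """xi_t(i, j): joint probability of s_i at time t and s_j at time t+1."""
    return alpha_t_i * a_ij * b_j_next * beta_next_j / p_obs


# Hypothetical forward/backward/emission values; the result is 0 regardless,
# because a_12 = 0 multiplies the whole expression.
p_joint = joint_transition_probability(0.02, a_12, 0.4, 0.5, 0.01)
```

The gap between `p_product` and `p_joint` is the probability mass the assertion assigns to a sequence the HMM can never generate.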


All three methods suffer from inherent scaling problems, because the calculation
of $\alpha'_t(i)$, $\gamma_t(i)$, and $\delta''_t(i)$ generates values that scale down geometrically with t. There are
published numerical techniques designed to mitigate this problem [Ma]; nevertheless, this
constraint limits the length of the sequence of observables $O_1 O_2 \ldots O_T$, and its
corresponding sequence $I = \langle S_1, P_1 \rangle, \langle S_2, P_2 \rangle, \ldots, \langle S_T, P_T \rangle$ of assertion inputs.
Meanwhile, the RV process, in and of itself, is not necessarily limited in duration, and
might continue working for intervals longer than T.

A straightforward solution to the scaling problem is to perform RV using a
sequence of frames of observables of length T, where the probability measurement values
computed at time T (i.e., $\alpha'_T(i)$, $\gamma_T(i)$, or $\delta''_T(i)$) of a certain frame are used as $\pi_i$ for the
following frame. This approach, however, introduces an error or noise every time we
reload the frame buffer.

To circumvent this problem, we propose the following smoothing approach, in
which we use two partially overlapping buffers of observables of length T. Buffer $B_1$
contains observables $O_{n+1} O_{n+2} \ldots O_{n+T}$, while buffer $B_2$ contains observables
$O_{m+1} O_{m+2} \ldots O_{m+T}$, where $m = n + T/2$; in other words, $B_1[t] = B_2[t + T/2]$ if $t < T/2$ and
$B_1[t] = B_2[t - T/2]$ otherwise. This is applied repeatedly for frames n = 0, 1, 2, ..., where the
roles of $B_1$ and $B_2$ alternate. Now suppose we are using the gamma method; we apply it
to each buffer, resulting in $\gamma^1_t(i)$ and $\gamma^2_{t+T/2}(i)$ if $t < T/2$, and $\gamma^1_t(i)$ and $\gamma^2_{t-T/2}(i)$ otherwise.

Finally, we combine these two $\gamma$ values into our actual $\gamma_t(i)$ using a weighted average
that weighs the value that is closer to the center of its buffer more than the one that is
farther away from the center of its buffer:

$\gamma_t(i) = (2t \, \gamma^1_t(i) + (T - 2t) \, \gamma^2_{t+T/2}(i)) / T$, if $t < T/2$
$\gamma_t(i) = ((2T - 2t) \, \gamma^1_t(i) + (2t - T) \, \gamma^2_{t-T/2}(i)) / T$, otherwise.
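The weighting scheme can be sketched as follows. This is a minimal illustration under assumptions not stated in the paper: T is even, buffer positions are 0-based, and the per-buffer gamma values are already available as lists indexed by buffer position and then by state. The names `blend_weights` and `smoothed_gamma` are hypothetical.

```python
def blend_weights(t, T):
    """Return (w1, w2): weights of the first and second buffer estimates
    for position t of the first buffer. The weights always sum to 1."""
    if t < T / 2:
        return 2 * t / T, (T - 2 * t) / T
    return (2 * T - 2 * t) / T, (2 * t - T) / T


def smoothed_gamma(t, T, gamma_b1, gamma_b2):
    """Blend the two per-state estimates for position t of the first buffer.

    gamma_b2 is assumed to hold the estimates of whichever overlapping buffer
    covers position t, offset by half a buffer length.
    """
    shifted = t + T // 2 if t < T / 2 else t - T // 2
    w1, w2 = blend_weights(t, T)
    return [w1 * g1 + w2 * g2
            for g1, g2 in zip(gamma_b1[t], gamma_b2[shifted])]


# Illustrative two-state data: constant estimates in each buffer.
T = 8
gamma_b1 = [[0.5, 0.5] for _ in range(T)]
gamma_b2 = [[0.1, 0.9] for _ in range(T)]
blended = smoothed_gamma(2, T, gamma_b1, gamma_b2)
```

At the center of the first buffer (t = T/2) its own estimate receives the full weight, while at its edges the estimate from the overlapping buffer, which is then at its own center, takes over.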

In future research we will conduct experiments that measure the deviation of
$\alpha'_t(i)$, $\gamma_t(i)$, and $\delta''_t(i)$ from their true values when this method is used.

9.  RV  OF  ASSERTIONS  WITH  HIDDEN  CONTINUOUS  DATA 

While  requirement  R1  asserts  about  vehicle  speed  greater  than  42km/h  in  the 
Main  direction,  the  matching  statechart  assertion  of  Fig.  1  asserts  about  Speed  values 
being  one  of  the  symbols  (HMM  states)  HIGH,  MED,  or  LOW;  as  a  consequence,  the 
task  of  matching  HMM  states  to  vehicle  speed  values  becomes  the  TLC’s  HMM 
designer’s  responsibility,  while  this  is  actually  a  requirement  vs.  assertion  matching  issue. 

An additional drawback of this approach is that the random variable being
asserted about (Speed, in the TLC case) typically has a more complex distribution than
the simplistic Categorical distribution.

Suppose  that  TLC  Speed  is  not  the  HMM  state,  but  a  random  variable  associated 
with  the  state,  one  with  a  continuous  distribution  such  as  a  normal  distribution.  The  Table 
3  example  lists  the  parameters  of  the  Speed  random  variable  distribution  for  the  TLC 
example. 


State:                  HIGH    MED     LOW
Name of distribution    F0      F1      F2
μ (km/h)                40      28      14
σ (km/h)                12      15      10

Table 3. Normal distribution parameters of the Speed random variable for the TLC states.

Using this framework, we can now use a variant of the assertion of Fig. 1 that
uses transition conditions Speed>42 and Speed<=42 instead of Speed==HIGH and
Speed!=HIGH, respectively, thus addressing the letter of requirement R1.

Let Speed(t,i) denote a random variable (r.v.) representing Speed when the HMM
is in state i at time t. We assume its distribution is time independent, and therefore write
Speed(i); its cumulative distribution function (CDF) is
$F_{Speed(i)}(speed) = P(Speed \le speed \mid q_t = s_i, \lambda)$.
We also make the following counter-intuitive assumption: Speed(i) is
independent of the observables (sensor frequency measurements), given the present state
$s_i$. It is counter-intuitive because, after all, vehicle speed seems to depend on those
frequencies. Nevertheless, the dependence is fully captured by $q_t = s_i$, and given
that, Speed(i) is independent of the observables.

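We now define modified variables $\alpha^{\circ}$, $\beta^{\circ}$, and $\gamma^{\circ}$ as expressions rather than literal
numbers, as follows:

• $\alpha^{\circ}_{Speed,t}(speed, i) = P(O_1 O_2 \ldots O_t, Speed \le speed, q_t = s_i \mid \lambda)$.

Clearly,

$\alpha^{\circ}_{Speed,t}(speed, i) = P(O_1 O_2 \ldots O_t, q_t = s_i \mid \lambda) \, P(Speed \le speed \mid O_1 O_2 \ldots O_t, q_t = s_i, \lambda) = \alpha_t(i) \, P(Speed \le speed \mid q_t = s_i, \lambda)$,

where the last equality results from Speed(i) being independent of the observables.

Hence:

$\alpha^{\circ}_{Speed,t}(speed, i) = \alpha_t(i) \, F_{Speed(i)}(speed)$.

The normalized version, $\alpha^{\circ\prime}$, is:

$\alpha^{\circ\prime}_{Speed,t}(speed, i) = \alpha'_t(i) \, F_{Speed(i)}(speed)$.

• $\beta^{\circ}_{Speed,t}(speed, i) = P(O_{t+1} O_{t+2} \ldots O_T, Speed(i) \le speed \mid q_t = s_i, \lambda)$. As in the case of $\alpha$,

$\beta^{\circ}_{Speed,t}(speed, i) = \beta_t(i) \, F_{Speed(i)}(speed)$.

• $\gamma^{\circ}_{Speed,t}(speed, i) = P(Speed(i) \le speed, q_t = s_i \mid O_1 \ldots O_T, \lambda) = \alpha_t(i) \, \beta_t(i) \, F_{Speed(i)}(speed) / P(O_1 \ldots O_T)$.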

The RV process of section 6 is modified as follows. In addition to using the alpha
or gamma methods to calculate a Categorical POD for HMM states such as POD-1, we
calculate $\alpha^{\circ}$ or $\gamma^{\circ}$, respectively, using an instance value of speed (e.g., speed = 42) based
on the assertion. More specifically, given an RV computation Con, the calculation of
P(Con1) and P(Con2) discussed in section 6 is modified as follows:

• If event_t is hidden, the calculation is unchanged, because the probability of a transition
being traversed only depends on states and observations, not on the Speed variable. In
other words, Speed only pertains to conditions in the assertion statecharts, not events.

• If condition_t is hidden, as in Speed<=42 in the modified assertion of Fig. 1, then we
calculate P(condition_t), the probability of the condition, by summing
$\alpha^{\circ\prime}_{Speed,t}(42, i)$ over all states, namely
$\sum_{1 \le i \le N} \alpha'_t(i) \, F_{Speed(i)}(42)$ for the alpha method, or by summing $\gamma^{\circ}_{Speed,t}(42, i)$
over all states, namely $\sum_{1 \le i \le N} \gamma_t(i) \, F_{Speed(i)}(42)$, for the gamma method.
We then set P(Con1) = P(Con) P(condition_t), and P(Con2) = P(Con) (1 - P(condition_t)),
as in section 6.
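The hidden-condition calculation can be sketched with the Table 3 parameters. The state posterior `alpha_prime_t` and the configuration probability `p_con` below are made-up values, and the function names are hypothetical; the normal CDF is computed with the standard library's `math.erf`.

```python
import math

# Table 3: (mean, standard deviation) of Speed in km/h for each HMM state.
SPEED_PARAMS = {"HIGH": (40.0, 12.0), "MED": (28.0, 15.0), "LOW": (14.0, 10.0)}


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std. deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def condition_probability(state_posterior, threshold):
    """P(Speed <= threshold): sum over states of posterior(i) * F_Speed(i)."""
    return sum(p * normal_cdf(threshold, *SPEED_PARAMS[state])
               for state, p in state_posterior.items())


# Hypothetical alpha'_t(i) values for one time step (they sum to 1).
alpha_prime_t = {"HIGH": 0.6, "MED": 0.3, "LOW": 0.1}
p_condition = condition_probability(alpha_prime_t, 42.0)

# Splitting a configuration Con on the condition, as in section 6.
p_con = 0.8  # hypothetical P(Con)
p_con1 = p_con * p_condition          # branch where Speed <= 42 holds
p_con2 = p_con * (1.0 - p_condition)  # branch where it does not
```

For the gamma method the same function applies, with the $\gamma_t(i)$ values passed in place of `alpha_prime_t`.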

10.  A  COMPARISON  OF  RV  ARCHITECTURES 

We  considered  the  following  two  architectures  for  RV  of  systems  with  hidden 
information,  in  addition  to  the  weighted-probabilistic  assertion  architecture  of  Fig.  4: 

The  first  alternative  architecture,  denoted  the  deterministic  assertion  architecture, 
resembles  that  of  Fig.  4,  but  has  the  HMM  connected  to  a  purely  deterministic  formal- 
specification  assertion,  instead  of  a  weighted  probabilistic  module  described  in  section  6. 
In  other  words,  this  architecture  is  the  architecture  of  Fig.  4  where  the  Formal 
Specification  Assertion  block  implements  assertions  using  a  conventional  deterministic 
implementation, such as the one described in [D1].

Because this approach uses a deterministic assertion, it can only use a single
sequence of input symbols from the HMM, such as the sequence $a_1, a_2, \ldots, a_T$ where
$a_t = \arg\max_{1 \le i \le N} \delta_t(i)$. However, the following example demonstrates the weakness of this
approach.
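A minimal sketch of such a single-sequence selection follows. It picks, at every time step, the state with the largest Viterbi variable, which is one reading of the formula above; the two-state HMM and the function name `single_sequence` are hypothetical. Note that this per-step choice is not the same as Viterbi backtracking, which would reconstruct one globally most probable path.

```python
def single_sequence(A, B, pi, obs):
    """Return the per-step argmax of delta_t(i) and the final maximum delta."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    symbols = [max(range(n), key=lambda i: delta[i])]
    for o in obs[1:]:
        # Viterbi recursion: best predecessor times emission probability.
        delta = [max(delta[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        symbols.append(max(range(n), key=lambda i: delta[i]))
    return symbols, max(delta)


# Hypothetical two-state HMM in which each observable favors one state.
A2 = [[0.5, 0.5],
      [0.5, 0.5]]
B2 = [[0.9, 0.1],
      [0.1, 0.9]]
PI2 = [0.5, 0.5]
symbols, best_probability = single_sequence(A2, B2, PI2, [0, 1, 1])
```

Every state other than the selected one is discarded at each step, which is why a deterministic assertion fed by this sequence cannot react to less probable alternatives.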

Consider the TLC scenario depicted in Fig. 9a. Using the above-mentioned single
sequence method, it induces the sequence $seq_1$ of hidden states depicted in Fig. 9b, with
probability $P_1 = \delta_T(seq_1)$ = 9.677258147046034E-7. This sequence conforms to
requirement R1 because it does not contain two consecutive minutes of Speed=HIGH
while lights are green.

In contrast, the sequence $seq_2$ of state symbols depicted in Fig. 9c violates R1. It is
not generated by the single sequence method because its probability, $P_2 = \delta_T(seq_2)$ =
4.639731359753094E-11, is smaller than $P_1$.

The alpha and gamma methods are capable of generating the latter sequence,
thereby enabling our suggested weighted-assertion architecture to detect the violation of
R1 with a non-zero probability. Fig. 10 shows the distribution of $\alpha'$ and $\gamma$ for $seq_1$
and $seq_2$. Note how the $\alpha'$ method generates identical distributions when the observation
sequences $seq_1$ and $seq_2$ agree, because when the observation sequences agree on the
HMM state $q_t = s_t$, then given that state, the observation $O_t$ is independent of prior
observations and states. In contrast, the $\gamma$ method depends on future observations too,
and $O_t$ is not necessarily independent of those.

a. Timeline diagram of a sequence of observables O. Segments, in order: freq=1, freq=2,
freq=3, freq=2, freq=3, freq=4. Time-axis ticks: 0, 16, 20, 95, 100, 195, 200.

b. Timeline diagram of $seq_1$, the most probable state sequence that explains O according
to the single sequence method. This sequence conforms to requirement R1. Segments, in
order: Speed=a.HIGH, Speed=a.LOW, Speed=a.MED, Speed=a.HIGH, Speed=a.HIGH.
Time-axis ticks: 0, 1, 16, 20, 90, 200.

c. Timeline diagram of $seq_2$, a less probable state sequence due to the time interval
[95,99]; this sequence violates requirement R1. Segments, in order: Speed=a.HIGH,
Speed=a.LOW, Speed=a.MED, Speed=a.HIGH, Speed=a.MED, Speed=a.HIGH.
Time-axis ticks: 0, 1, 16, 20, 95, 100, 210.

Figure 9. Scenarios that discriminate between the weighted-probabilistic assertion
architecture and the deterministic assertion architecture

a. The distribution of $\alpha'$ for $seq_1$
b. The distribution of $\alpha'$ for $seq_2$
c. The distribution of $\gamma$ for $seq_1$
d. The distribution of $\gamma$ for $seq_2$

Figure 10. The distribution of $\alpha'$ and $\gamma$ for $seq_1$ and $seq_2$

The second alternative architecture, denoted the monolithic architecture, contains
no standalone RV module. Rather, the HMM itself, being a probabilistic state machine,
performs the RV tasks. With this approach, the assertions are combined with the symbol
decoding HMM, inducing a much larger HMM.

Two  primary  drawbacks  of  this  approach  are: 

a.  The  overall  RV  system  is  hard  to  read  and  maintain;  with  no  separation  of  concerns 
within  the  HMM,  it  is  effectively  performing  two  distinct  jobs:  (i)  decoding  hidden 
symbols  from  visible  ones,  and  (ii)  monitoring  or  verifying  a  requirement  such  as  R1  or 
R2. 

b. The HMM in the monolithic architecture, being larger and harder to read, will be
harder to learn using the experimental approach discussed in section 5.

11.  CONCLUSION 

We  have  demonstrated  a  technique  for  performing  RV  in  the  presence  of  hidden 
evidence.  We  plan  on  applying  this  technique  to  the  verification  of  aerospace 
applications,  in  which  the  evidence  is  provided  as  telemetry  files  that  often  do  not  contain 
the  artifacts  asserted  about  by  the  formal  specifications.  We  also  plan  on  applying  this 
technique  to  automatic  pattern  detection  within  large  volumes  of  cyber  data,  in  an  effort 
to  identify  malicious  or  dangerous  behavioral  patterns. 

We  are  currently  building  a  special  StateRover  code-generator  that  generates 
weighted/probabilistic  implementation  code  for  statechart  assertions. 

Considering the TLC example, one might wonder how the TLC itself is
implemented to conform with requirement R1, given that the Speed variable is hidden. In
other words, wouldn’t TLC developers face the same difficulties when implementing the
TLC as the quality assurance team faces when asserting about it? Indeed, in on-going
research we are investigating the use of the proposed technique for controllers that
operate in difficult environments where some of the inputs are not directly observable, as
is often the case in hostile environments.